In this example, we are going to learn about an alternative method for encoding text data known as word embeddings. This is not a complete tutorial on word embeddings, but it will at least give you a basic understanding of when and why we use them.

Learning objectives:

Requirements

# Initialize package
library(keras)
library(fs)
library(tidyverse)
library(glue)
library(progress)

# helper functions we'll use to explore word embeddings
source("helper_functions.R")

The “real” IMDB dataset

So far, we’ve been using the built-in IMDB dataset. Here, we are going to use the original data files, which can be found at http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz. We have already downloaded this data for you.

imdb_dir <- here::here("docs", "data", "imdb")
fs::dir_tree(imdb_dir, type = "directory")
/Users/b294776/Desktop/Workspace/Training/rstudio-conf-2020/dl-keras-tf/docs/data/imdb
├── test
│   ├── neg
│   └── pos
└── train
    ├── neg
    └── pos

You can see the data have already been separated into training vs. test sets and positive vs. negative sets. The actual reviews are contained in individual .txt files. We can use this structure to our advantage - the code below iterates over each review and

  1. creates the path to each individual review file,
  2. creates a label based on the “neg” or “pos” folder the review is in,
  3. and saves the output as a data frame with each review on an individual row.
training_files <- file.path(imdb_dir, "train") %>%
  dir_ls() %>%
  map(dir_ls) %>%
  set_names(basename) %>%
  plyr::ldply(data_frame) %>%
  set_names(c("label", "path"))

training_files

We can see our response observations are balanced:

count(training_files, label)

We can now iterate over each row and

  1. save the label in a label vector,
  2. import the movie review and
  3. save in a texts vector.
obs <- nrow(training_files)
labels <- vector(mode = "integer", length = obs)
texts <- vector(mode = "character", length = obs)

# this just allows us to track progress of our loop
pb <- progress_bar$new(total = obs, width = 60)

for (file in seq_len(obs)) {
  pb$tick()
  
  label <- training_files[[file, "label"]]
  path <- training_files[[file, "path"]]
  
  labels[file] <- ifelse(label == "neg", 0, 1)
  texts[file] <- readChar(path, nchars = file.size(path)) 
  
}

[=====================================================] 100%

We now have two vectors, one consisting of the labels and the other holding each review.

table(labels)
labels
    0     1 
12500 12500 
cat("\n")
texts[1]
[1] "Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. Even those from the era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader. On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly."

Exploratory text analysis

A little exploratory analysis will show us the total number of unique words across our corpus and the typical (median) length of each review.

text_df <- texts %>%
  tibble(.name_repair = ~ "text") %>%
  mutate(text_length = str_count(text, "\\w+"))

unique_words <- text_df %>%
  tidytext::unnest_tokens(word, text) %>%
  pull(word) %>%
  n_distinct()

avg_review_length <- median(text_df$text_length, na.rm = TRUE)
  
ggplot(text_df, aes(text_length)) +
  geom_histogram(bins = 100, fill = "grey70", color = "grey40") +
  geom_vline(xintercept = avg_review_length, color = "red", lty = "dashed") +
  scale_x_log10("# words") +
  ggtitle(glue("Median review length is {avg_review_length} words"),
          subtitle = glue("Total number of unique words is {unique_words}"))

Word embeddings for language modeling

Word embeddings are designed to encode general semantic relationships, which can serve two principal purposes. The first is language modeling, which aims to encode words for tasks such as predicting synonyms, completing sentences, and identifying word relationships.

See slides for more discussion of this type of modeling. We are not focusing on word embeddings for language modeling; however, I have written a couple of helper functions that train word embeddings for it. See the code behind these helper functions here.

# clean up text and compute word embeddings
clean_text <- tolower(texts) %>%
  str_replace_all(pattern = "[[:punct:] ]+", replacement = " ") %>%
  str_trim()

word_embeddings <- get_embeddings(clean_text)
Creating vocabulary...
Creating term-co-occurence matrix...
Computing embeddings based on GloVe algorithm...
INFO [2019-11-19 07:11:24] 2019-11-19 07:11:24 - epoch 1, expected cost 0.0826
INFO [2019-11-19 07:11:25] 2019-11-19 07:11:25 - epoch 2, expected cost 0.0555
INFO [2019-11-19 07:11:26] 2019-11-19 07:11:26 - epoch 3, expected cost 0.0485
INFO [2019-11-19 07:11:27] 2019-11-19 07:11:27 - epoch 4, expected cost 0.0443
INFO [2019-11-19 07:11:28] 2019-11-19 07:11:28 - epoch 5, expected cost 0.0415
INFO [2019-11-19 07:11:29] 2019-11-19 07:11:29 - epoch 6, expected cost 0.0395
INFO [2019-11-19 07:11:30] 2019-11-19 07:11:30 - epoch 7, expected cost 0.0379
INFO [2019-11-19 07:11:31] 2019-11-19 07:11:31 - epoch 8, expected cost 0.0367
INFO [2019-11-19 07:11:33] 2019-11-19 07:11:33 - epoch 9, expected cost 0.0357
INFO [2019-11-19 07:11:34] 2019-11-19 07:11:34 - epoch 10, expected cost 0.0348
INFO [2019-11-19 07:11:35] 2019-11-19 07:11:35 - epoch 11, expected cost 0.0341
INFO [2019-11-19 07:11:36] 2019-11-19 07:11:36 - epoch 12, expected cost 0.0335
INFO [2019-11-19 07:11:37] 2019-11-19 07:11:37 - epoch 13, expected cost 0.0330
INFO [2019-11-19 07:11:38] 2019-11-19 07:11:38 - epoch 14, expected cost 0.0325
INFO [2019-11-19 07:11:40] 2019-11-19 07:11:40 - epoch 15, expected cost 0.0321
INFO [2019-11-19 07:11:41] 2019-11-19 07:11:41 - epoch 16, expected cost 0.0318
INFO [2019-11-19 07:11:42] 2019-11-19 07:11:42 - epoch 17, expected cost 0.0314
INFO [2019-11-19 07:11:43] 2019-11-19 07:11:43 - epoch 18, expected cost 0.0311
INFO [2019-11-19 07:11:43] Success: early stopping. Improvement at iterartion 18 is less then convergence_tol
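
For the curious, here is a minimal sketch of what a helper like get_embeddings might look like internally. The log output above indicates it relies on the GloVe implementation in the text2vec package; the function body and parameter values below are assumptions for illustration, not the actual helper code:

library(text2vec)

sketch_embeddings <- function(text, dims = 100) {
  # build a token iterator and the corpus vocabulary
  it    <- itoken(space_tokenizer(text))
  vocab <- create_vocabulary(it)

  # term-co-occurrence matrix within a symmetric skip-gram window
  tcm <- create_tcm(it, vocab_vectorizer(vocab), skip_grams_window = 5L)

  # fit GloVe; the main and context vectors are typically summed
  glove   <- GlobalVectors$new(rank = dims, x_max = 10)
  wv_main <- glove$fit_transform(tcm, n_iter = 20, convergence_tol = 0.01)
  wv_main + t(glove$components)
}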

Explore your own words!

# find words with similar embeddings
get_similar_words("horrible", word_embeddings)
 horrible  terrible     awful       bad    acting 
1.0000000 0.9350510 0.8963729 0.7977457 0.7829555 
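
Under the hood, finding similar words boils down to cosine similarity: take the embedding vector for the query word and rank every other word’s vector by how closely it points in the same direction. A minimal sketch under that assumption (the actual helper may differ):

# rank all words by cosine similarity to the query word (a sketch)
similar_words <- function(word, embeddings, n = 5) {
  target <- embeddings[word, , drop = FALSE]
  sims   <- text2vec::sim2(embeddings, target, method = "cosine", norm = "l2")
  head(sort(sims[, 1], decreasing = TRUE), n)
}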

Word embeddings for classification

The other principal purpose for word embeddings is to encode text for classification. In this case, we train the word embeddings to take on weights that optimize the classification loss function.

See slides for more discussion of this type of modeling.

Prepare data

To prepare our data we need to convert our labels vector to a tensor:

labels <- as.array(labels)

But more importantly, we need to preprocess our text features. To do so we:

  1. Specify how many words we want to include. This will capture the 10,000 words with the highest usage (frequency).
  2. Create a text_tokenizer object, which defines how we want to preprocess the text (e.g. convert to lowercase, remove punctuation, define the token-splitting characters). For the most part, the defaults are sufficient.
  3. Apply the tokenizer to our text with fit_text_tokenizer. This results in an object with many details about our corpus (e.g. word counts, word index).
top_n_words <- 10000

tokenizer <- text_tokenizer(num_words = top_n_words) %>% 
  fit_text_tokenizer(texts)

names(tokenizer)
 [1] "char_level"                   "document_count"               "filters"                      "fit_on_sequences"            
 [5] "fit_on_texts"                 "get_config"                   "index_docs"                   "index_word"                  
 [9] "lower"                        "num_words"                    "oov_token"                    "sequences_to_matrix"         
[13] "sequences_to_texts"           "sequences_to_texts_generator" "split"                        "texts_to_matrix"             
[17] "texts_to_sequences"           "texts_to_sequences_generator" "to_json"                      "word_counts"                 
[21] "word_docs"                    "word_index"                  
total_word_index <- tokenizer$word_index
num_words_used <- tokenizer$num_words

glue("We have now tokenized our reviews. ", "We are considering {num_words_used} ",
     "of {length(total_word_index)} total unique words. The most common words ",
     "include:")
We have now tokenized our reviews. We are considering 10000 of 88582 total unique words. The most common words include:
head(total_word_index)
$the
[1] 1

$and
[1] 2

$a
[1] 3

$of
[1] 4

$to
[1] 5

$is
[1] 6
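
Because word_index is simply a named list mapping each word to its frequency rank, you can also look up individual words directly:

# where does a given word fall in the frequency ranking?
total_word_index[["horrible"]]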

Next, we extract our vectorized review data as a list. This should look familiar from the earlier modules.

sequences <- texts_to_sequences(tokenizer, texts)

# The vectorized first instance:
sequences[[1]]
  [1]   62    4    3  129   34   44 7576 1414   15    3 4252  514   43   16    3  633  133   12    6    3 1301  459    4 1751  209    3 7693
 [28]  308    6  676   80   32 2137 1110 3008   31    1  929    4   42 5120  469    9 2665 1751    1  223   55   16   54  828 1318  847  228
 [55]    9   40   96  122 1484   57  145   36    1  996  141   27  676  122    1  411   59   94 2278  303  772    5    3  837   20    3 1755
 [82]  646   42  125   71   22  235  101   16   46   49  624   31  702   84  702  378 3493    2 8422   67   27  107 3348

We can see how our tokenizer converted our original text to a cleaned-up version. Notice that words falling outside the top 10,000 (e.g. “chantings”) are simply dropped:

cat(crayon::blue("Original text:\n"))
Original text:

texts[[1]]
[1] "Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. Even those from the era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader. On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly."
cat(crayon::blue("\nRevised text:\n"))

Revised text:

paste(unlist(tokenizer$index_word)[sequences[[1]]], collapse = " ")
[1] "story of a man who has unnatural feelings for a pig starts out with a opening scene that is a terrific example of absurd comedy a orchestra audience is turned into an insane violent mob by the crazy of it's singers unfortunately it stays absurd the whole time with no general narrative eventually making it just too off putting even those from the era should be turned off the dialogue would make shakespeare seem easy to a third on a technical level it's better than you might think with some good cinematography by future great future stars sally and forrest can be seen briefly"

Next, since each review is a different length, we need to restrict each review to a fixed number of words so that all our features (reviews) have the same length.

Note (?pad_sequences):

  * Any reviews that are shorter than this length will be padded.
  * Any reviews that are longer than this length will be truncated.

max_len <- 150
features <- pad_sequences(sequences, maxlen = max_len)
features[1,]
  [1]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
 [28]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0   62    4    3  129   34   44 7576 1414
 [55]   15    3 4252  514   43   16    3  633  133   12    6    3 1301  459    4 1751  209    3 7693  308    6  676   80   32 2137 1110 3008
 [82]   31    1  929    4   42 5120  469    9 2665 1751    1  223   55   16   54  828 1318  847  228    9   40   96  122 1484   57  145   36
[109]    1  996  141   27  676  122    1  411   59   94 2278  303  772    5    3  837   20    3 1755  646   42  125   71   22  235  101   16
[136]   46   49  624   31  702   84  702  378 3493    2 8422   67   27  107 3348
paste(unlist(tokenizer$index_word)[features[1,]], collapse = " ")
[1] "story of a man who has unnatural feelings for a pig starts out with a opening scene that is a terrific example of absurd comedy a orchestra audience is turned into an insane violent mob by the crazy of it's singers unfortunately it stays absurd the whole time with no general narrative eventually making it just too off putting even those from the era should be turned off the dialogue would make shakespeare seem easy to a third on a technical level it's better than you might think with some good cinematography by future great future stars sally and forrest can be seen briefly"

Your Turn!

Check out different reviews and see how we have transformed the data. Remove eval=FALSE to run.

# use review number (i.e. 2, 10, 150)
which_review <- ____
  
cat(crayon::blue("Original text:\n"))
texts[[which_review ]]

cat(crayon::blue("\nRevised text:\n"))
paste(unlist(tokenizer$index_word)[features[which_review ,]] , collapse = " ")

cat(crayon::blue("\nEncoded text:\n"))
features[which_review ,]

Our data is now preprocessed! We have 25000 observations and 150 features.

dim(features)
[1] 25000   150
dim(labels)
[1] 25000

Model training

To train our model we will use the validation_split argument within fit. Remember, this takes the last 20% of our data to be used as our validation set. But if you recall, our data was organized into neg and pos folders, so we should shuffle our data to make sure our validation set doesn’t end up being all positive or all negative reviews!

set.seed(123)
index <- sample(1:nrow(features))

x_train <- features[index, ]
y_train <- labels[index]

To create a network architecture that includes word embeddings, we need to include two things:

  1. a layer_embedding layer that creates the embeddings,
  2. a layer_flatten layer to flatten our embeddings to a 2D tensor for the densely connected portion of our model.
model <- keras_model_sequential() %>%
  layer_embedding(
    input_dim = top_n_words,  # number of words we are considering
    input_length = max_len,   # length that we have set each review to
    output_dim = 32            # length of our word embeddings
    ) %>%  
  layer_flatten() %>% 
  layer_dense(units = 1, activation = "sigmoid")
summary(model)
Model: "sequential"
____________________________________________________________________________________________________________________________________________
Layer (type)                                                   Output Shape                                            Param #              
============================================================================================================================================
embedding (Embedding)                                          (None, 150, 32)                                         320000               
____________________________________________________________________________________________________________________________________________
flatten (Flatten)                                              (None, 4800)                                            0                    
____________________________________________________________________________________________________________________________________________
dense (Dense)                                                  (None, 1)                                               4801                 
============================================================================================================================================
Total params: 324,801
Trainable params: 324,801
Non-trainable params: 0
____________________________________________________________________________________________________________________________________________
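
The parameter counts in this summary are worth a quick sanity check: the embedding layer learns one 32-length vector for each of the 10,000 words, the flatten layer merely reshapes the 150 x 32 output without adding parameters, and the dense layer has one weight per flattened value plus a bias:

# reproducing the parameter counts from summary(model)
top_n_words * 32    # embedding parameters: 320,000
max_len * 32        # flattened output size:  4,800
max_len * 32 + 1    # dense parameters:       4,801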

The rest of our modeling procedure follows the same protocol that you’ve seen in the other modules.

model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2
)
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 3s 161us/sample - loss: 0.4990 - acc: 0.7595 - val_loss: 0.3342 - val_acc: 0.8548
Epoch 2/10
20000/20000 [==============================] - 3s 127us/sample - loss: 0.2541 - acc: 0.8997 - val_loss: 0.3012 - val_acc: 0.8738
Epoch 3/10
20000/20000 [==============================] - 3s 135us/sample - loss: 0.1908 - acc: 0.9286 - val_loss: 0.3053 - val_acc: 0.8788
Epoch 4/10
20000/20000 [==============================] - 3s 126us/sample - loss: 0.1443 - acc: 0.9480 - val_loss: 0.3178 - val_acc: 0.8718
Epoch 5/10
20000/20000 [==============================] - 3s 128us/sample - loss: 0.1021 - acc: 0.9677 - val_loss: 0.3374 - val_acc: 0.8696
Epoch 6/10
20000/20000 [==============================] - 3s 125us/sample - loss: 0.0660 - acc: 0.9815 - val_loss: 0.3562 - val_acc: 0.8656
Epoch 7/10
20000/20000 [==============================] - 3s 127us/sample - loss: 0.0383 - acc: 0.9912 - val_loss: 0.3884 - val_acc: 0.8624
Epoch 8/10
20000/20000 [==============================] - 3s 125us/sample - loss: 0.0204 - acc: 0.9960 - val_loss: 0.4210 - val_acc: 0.8620
Epoch 9/10
20000/20000 [==============================] - 3s 129us/sample - loss: 0.0104 - acc: 0.9984 - val_loss: 0.4613 - val_acc: 0.8576
Epoch 10/10
 6496/20000 [========>.....................] - ETA: 1s - loss: 0.0037 - acc: 0.9997
 7008/20000 [=========>....................] - ETA: 1s - loss: 0.0037 - acc: 0.9997
 7488/20000 [==========>...................] - ETA: 1s - loss: 0.0037 - acc: 0.9997
 7904/20000 [==========>...................] - ETA: 1s - loss: 0.0037 - acc: 0.9997
 8320/20000 [===========>..................] - ETA: 1s - loss: 0.0038 - acc: 0.9995
 8800/20000 [============>.................] - ETA: 1s - loss: 0.0039 - acc: 0.9995
 9312/20000 [============>.................] - ETA: 1s - loss: 0.0040 - acc: 0.9995
 9824/20000 [=============>................] - ETA: 1s - loss: 0.0040 - acc: 0.9995
10304/20000 [==============>...............] - ETA: 1s - loss: 0.0040 - acc: 0.9995
10816/20000 [===============>..............] - ETA: 0s - loss: 0.0039 - acc: 0.9995
11168/20000 [===============>..............] - ETA: 0s - loss: 0.0039 - acc: 0.9996
11648/20000 [================>.............] - ETA: 0s - loss: 0.0043 - acc: 0.9993
12128/20000 [=================>............] - ETA: 0s - loss: 0.0042 - acc: 0.9993
12640/20000 [=================>............] - ETA: 0s - loss: 0.0042 - acc: 0.9994
13152/20000 [==================>...........] - ETA: 0s - loss: 0.0042 - acc: 0.9994
13632/20000 [===================>..........] - ETA: 0s - loss: 0.0042 - acc: 0.9994
14144/20000 [====================>.........] - ETA: 0s - loss: 0.0049 - acc: 0.9994
14624/20000 [====================>.........] - ETA: 0s - loss: 0.0049 - acc: 0.9994
15104/20000 [=====================>........] - ETA: 0s - loss: 0.0049 - acc: 0.9994
15584/20000 [======================>.......] - ETA: 0s - loss: 0.0051 - acc: 0.9993
16064/20000 [=======================>......] - ETA: 0s - loss: 0.0051 - acc: 0.9993
16544/20000 [=======================>......] - ETA: 0s - loss: 0.0052 - acc: 0.9992
17024/20000 [========================>.....] - ETA: 0s - loss: 0.0051 - acc: 0.9992
17504/20000 [=========================>....] - ETA: 0s - loss: 0.0052 - acc: 0.9992
17984/20000 [=========================>....] - ETA: 0s - loss: 0.0051 - acc: 0.9992
18496/20000 [==========================>...] - ETA: 0s - loss: 0.0051 - acc: 0.9992
19008/20000 [===========================>..] - ETA: 0s - loss: 0.0051 - acc: 0.9992
19520/20000 [============================>.] - ETA: 0s - loss: 0.0051 - acc: 0.9992
20000/20000 [==============================] - 3s 126us/sample - loss: 0.0053 - acc: 0.9991 - val_loss: 0.5033 - val_acc: 0.8566

YOUR TURN!

You may have noticed that we didn’t add any additional hidden layers to the densely connected portion of our model. Go ahead and add 1 or 2 more hidden layers. Also experiment with different word embedding dimensions (output_dim) and see if you can improve model performance.

yourturn_model <- keras_model_sequential() %>%
  layer_embedding(
    input_dim = _____,  
    input_length = _____,   
    output_dim = _____           
    ) %>%  
  layer_flatten() %>% 
  layer_dense(units = ____, activation = ____) %>%
  layer_dense(units = 1, activation = "sigmoid")

yourturn_model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)

yourturn_results <- yourturn_model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2
)

Comparing embeddings

Recall that the word embeddings we found for natural language modeling created results like:

# natural language modeling embeddings
get_similar_words("horrible", word_embeddings)
 horrible  terrible     awful       bad    acting 
1.0000000 0.9350510 0.8963729 0.7977457 0.7829555 

However, the embeddings we find for classification tasks are not always so clean and intuitive. We can get the word embeddings from our classification model with:

wts <- get_weights(model)
embedding_wts <- wts[[1]]

The following just does some bookkeeping to extract the applicable words and assign them as row names to the embedding matrix.

words <- tokenizer$word_index %>% 
  as_tibble() %>% 
  pivot_longer(everything(), names_to = "word", values_to = "id") %>%
  filter(id <= tokenizer$num_words) %>%
  arrange(id)

row.names(embedding_wts) <- words$word

The following is one of the custom functions you imported from the helper_functions.R file. You can see that the word embeddings that most closely align to a given word are not as intuitive as those produced by the natural language model. However, these are the embeddings that were optimized for the classification procedure at hand.

similar_classification_words("horrible", embedding_wts)
 horrible  costumes   express    dealer      mere    moving 
1.0000000 0.8079216 0.8072482 0.8031378 0.8018905 0.7940668 

Here’s a handy sequence of code that uses the t-SNE methodology to visualize nearest neighbor word embeddings.

# plotting too many words makes the output hard to read
n_words_to_plot <- 1000

tsne <- Rtsne::Rtsne(
  X = embedding_wts[1:n_words_to_plot,], 
  perplexity = 100, 
  pca = FALSE
  )

p <- tsne$Y %>%
  as.data.frame() %>%
  mutate(word = row.names(embedding_wts)[1:n_words_to_plot]) %>%
  ggplot(aes(x = V1, y = V2, label = word)) + 
  geom_text(size = 3)

plotly::ggplotly(p)
---
title: "NLP: Word embeddings"
output: html_notebook
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

In this example, we are going to learn about an alternative method to encode 
text data known as ___word embeddings___. This is an incomplete tutorial on 
word embeddings, but it will at least give you a basic understanding of when 
and why we use them.

Learning objectives:

- What word embeddings are.
- The two main contexts in which word embeddings are trained.
- When we should use word embeddings.
- How to train word embeddings for classification purposes.

# Requirements

```{r, message=FALSE}
# Initialize package
library(keras)
library(fs)
library(tidyverse)
library(glue)
library(progress)

# helper functions we'll use to explore word embeddings
source("helper_functions.R")
```

# The "real" IMDB dataset

So far, we've been using the built-in IMDB dataset. Here, we are going to use 
the original data files, which can be found at http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz. We have already
downloaded this data for you.

```{r}
imdb_dir <- here::here("docs", "data", "imdb")
fs::dir_tree(imdb_dir, type = "directory")
```

You can see the data have already been separated into test vs. training sets 
and positive vs. negative sets. The actual reviews are contained in individual 
.txt files. We can use this structure to our advantage - the code below 
iterates over each review and 

1. creates the path to each individual review file,
2. creates a label based on the "neg" or "pos" folder the review is in,
3. and saves the output as a data frame with each review on an individual row.

```{r}
training_files <- file.path(imdb_dir, "train") %>%
  dir_ls() %>%
  map(dir_ls) %>%
  set_names(basename) %>%
  plyr::ldply(data_frame) %>%
  set_names(c("label", "path"))

training_files
```

We can see our response observations are balanced:

```{r}
count(training_files, label)
```

We can now iterate over each row and

1. save the label in a `labels` vector,
2. import the movie review, and
3. save it in a `texts` vector.

```{r}
obs <- nrow(training_files)
labels <- vector(mode = "integer", length = obs)
texts <- vector(mode = "character", length = obs)

# this just allows us to track progress of our loop
pb <- progress_bar$new(total = obs, width = 60)

for (file in seq_len(obs)) {
  pb$tick()
  
  label <- training_files[[file, "label"]]
  path <- training_files[[file, "path"]]
  
  labels[file] <- ifelse(label == "neg", 0, 1)
  texts[file] <- readChar(path, nchars = file.size(path)) 
  
}
```
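
As an aside, the same result can be obtained without an explicit loop. Here is 
a vectorized sketch (using `read_file()` from readr, which tidyverse loads):

```{r, eval=FALSE}
# vectorized alternative to the loop above (produces the same two vectors)
labels <- ifelse(training_files$label == "neg", 0L, 1L)
texts  <- map_chr(training_files$path, read_file)
```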

We now have two vectors, one consisting of the labels and the other holding each
review.

```{r}
table(labels)

cat("\n")

texts[1]
```

# Exploratory text analysis

A little exploratory analysis will show us the total number of unique words
across our corpus and the average length of each review.

```{r, fig.height=3.5}
text_df <- texts %>%
  tibble(.name_repair = ~ "text") %>%
  mutate(text_length = str_count(text, "\\w+"))

unique_words <- text_df %>%
  tidytext::unnest_tokens(word, text) %>%
  pull(word) %>%
  n_distinct()

median_review_length <- median(text_df$text_length, na.rm = TRUE)
  
ggplot(text_df, aes(text_length)) +
  geom_histogram(bins = 100, fill = "grey70", color = "grey40") +
  geom_vline(xintercept = median_review_length, color = "red", lty = "dashed") +
  scale_x_log10("# words") +
  ggtitle(glue("Median review length is {median_review_length} words"),
          subtitle = glue("Total number of unique words is {unique_words}"))
```


# Word embeddings for language modeling

Word embeddings are designed to encode general semantic relationships which can
serve two principle purposes. The first is for ___language modeling___ which 
aims to encode words for the purpose of predicting synonyms, sentence completion, 
and word relationships.
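
Conceptually, an embedding layer is just a lookup table: each integer word 
index maps to a row of a learned weight matrix. Here is a minimal sketch (the 
`toy` model below is purely illustrative and not part of the tutorial 
pipeline):

```{r, eval=FALSE}
# an embedding layer maps integer word indices to dense vectors by row lookup
toy <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10, output_dim = 4, input_length = 3)

# three "words" (indices 2, 5, 7) return a 1 x 3 x 4 tensor: one 4-dimensional
# vector per word, pulled straight from the (randomly initialized) weights
predict(toy, matrix(c(2, 5, 7), nrow = 1))
```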

See [slides]() for more discussion of this type of modeling. We are not 
focusing on word embeddings for language modeling; however, I have written a 
couple of helper functions that train word embeddings for this purpose. See 
the code behind these helper functions [here](https://bit.ly/32HCP1G).

```{r}
# clean up text and compute word embeddings
clean_text <- tolower(texts) %>%
  str_replace_all(pattern = "[[:punct:] ]+", replacement = " ") %>%
  str_trim()

word_embeddings <- get_embeddings(clean_text)
```

Explore your own words!

```{r}
# find words with similar embeddings
get_similar_words("horrible", word_embeddings)
```


# Word embeddings for classification

The other principal purpose for word embeddings is to encode text for 
classification. In this case, we train the word embeddings to take on weights 
that optimize the classification loss function. 

See [slides]() for more discussion of this type of modeling.

## Prepare data

To prepare our data, we need to convert our `labels` vector to a tensor:

```{r}
labels <- as.array(labels)
```

But more importantly, we need to preprocess our text features. To do so we:

1. Specify how many words we want to include. This will capture the 10,000 
   words with the highest usage (frequency).
2. Create a `text_tokenizer` object, which defines how we want to preprocess 
   the text (e.g., convert to lowercase, remove punctuation, set the 
   token-splitting characters). For the most part, the defaults are sufficient.
3. Apply the tokenizer to our text with `fit_text_tokenizer`. This results in 
   an object with many details about our corpus (e.g., word counts, word 
   index).

```{r}
top_n_words <- 10000

tokenizer <- text_tokenizer(num_words = top_n_words) %>% 
  fit_text_tokenizer(texts)

names(tokenizer)
```

```{r}
total_word_index <- tokenizer$word_index
num_words_used <- tokenizer$num_words

glue("We have now tokenized our reviews. ", "We are considering {num_words_used} ",
     "of {length(total_word_index)} total unique words. The most common words ",
     "include:")
head(total_word_index)
```


Next, we extract our vectorized review data as a list. This looks familiar from 
the earlier modules.

```{r}
sequences <- texts_to_sequences(tokenizer, texts)

# The vectorized first instance:
sequences[[1]]
```

We can see how our tokenizer converted our original text to a cleaned up 
version:

```{r} 
cat(crayon::blue("Original text:\n"))
texts[[1]]

cat(crayon::blue("\nRevised text:\n"))
paste(unlist(tokenizer$index_word)[sequences[[1]]], collapse = " ")
```

Next, since each review is a different length, we need to limit ourselves to a
certain number of words so that all our features (reviews) are the same length. 

Note (`?pad_sequences`):

* Any reviews that are shorter than this length will be padded.
* Any reviews that are longer than this length will be truncated.
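
To make the padding and truncation behavior concrete, here is a minimal 
illustration with toy sequences (by default, `pad_sequences` pads and 
truncates at the front of each sequence):

```{r, eval=FALSE}
# toy illustration of the defaults: padding = "pre", truncating = "pre"
pad_sequences(list(c(1, 2), c(1, 2, 3, 4, 5)), maxlen = 4)
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    0    1    2    <- shorter sequence: zero-padded at the front
#> [2,]    2    3    4    5    <- longer sequence: earliest entries dropped
```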

```{r}
max_len <- 150
features <- pad_sequences(sequences, maxlen = max_len)
```

```{r}
features[1,]
```

```{r}
paste(unlist(tokenizer$index_word)[features[1,]], collapse = " ")
```

### Your Turn!

Check out different reviews and see how we have transformed the data. Remove 
`eval=FALSE` to run.

```{r, eval=FALSE}
# use review number (e.g., 2, 10, 150)
which_review <- ____

cat(crayon::blue("Original text:\n"))
texts[[which_review]]

cat(crayon::blue("\nRevised text:\n"))
paste(unlist(tokenizer$index_word)[features[which_review, ]], collapse = " ")

cat(crayon::blue("\nEncoded text:\n"))
features[which_review, ]
```


Our data is now preprocessed! We have `r nrow(features)` observations and 
`r ncol(features)` features.

```{r}
dim(features)
dim(labels)
```


## Model training

To train our model we will use the `validation_split` procedure within `fit`. 
Remember, this takes the *last* portion of our data (here 20%, via 
`validation_split = 0.2`) to be used as our validation set. But if you recall, 
our data was organized in neg and pos folders, so we should randomize it to 
make sure our validation set doesn't end up being all positive or all negative 
reviews!

```{r}
set.seed(123)
index <- sample(1:nrow(features))

x_train <- features[index, ]
y_train <- labels[index]
```
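
As a quick sanity check, the last 20% of the shuffled data (the slice that 
`validation_split = 0.2` will hold out) should now contain roughly half 
positive reviews:

```{r, eval=FALSE}
# fit() will hold out the *last* 20% of x_train/y_train for validation
val_idx <- (floor(0.8 * length(y_train)) + 1):length(y_train)
mean(y_train[val_idx])  # should be close to 0.5 after shuffling
```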

To create a network architecture that includes word embeddings, we need to 
include two things:

1. a `layer_embedding` layer that creates the embeddings, and
2. a `layer_flatten` layer to flatten our embeddings to a 2D tensor for the 
   densely connected portion of our model.

```{r}
model <- keras_model_sequential() %>%
  layer_embedding(
    input_dim = top_n_words,  # number of words we are considering
    input_length = max_len,   # length that we have set each review to
    output_dim = 32            # length of our word embeddings
    ) %>%  
  layer_flatten() %>% 
  layer_dense(units = 1, activation = "sigmoid")

summary(model)
```
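
As a sanity check on `summary(model)`, the parameter counts fall out of simple 
arithmetic:

```{r, eval=FALSE}
top_n_words * 32   # embedding weights: input_dim * output_dim = 320,000
max_len * 32       # flatten output: input_length * output_dim = 4,800 features
max_len * 32 + 1   # dense layer: one weight per feature, plus a bias = 4,801
```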

The rest of our modeling procedure follows the same protocols that you've seen 
in the other modules.

```{r}
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2
)
```

## YOUR TURN!

You may have noticed that we didn't add any additional hidden layers to the 
densely connected portion of our model.  Go ahead and add 1 or 2 more hidden 
layers.  Also experiment with different word embedding dimensions (`output_dim`) 
and see if you can improve model performance.

```{r, eval=FALSE}
yourturn_model <- keras_model_sequential() %>%
  layer_embedding(
    input_dim = _____,  
    input_length = _____,   
    output_dim = _____           
    ) %>%  
  layer_flatten() %>% 
  layer_dense(units = ____, activation = ____) %>%
  layer_dense(units = 1, activation = "sigmoid")

yourturn_model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)

yourturn_results <- yourturn_model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  validation_split = 0.2
)
```

## Comparing embeddings

Recall that the word embeddings we found for natural language modeling created 
results like:

```{r}
# natural language modeling embeddings
get_similar_words("horrible", word_embeddings)
```

However, the embeddings we find for classification tasks are not always so 
clean and intuitive. We can get the word embeddings from our classification 
model with:

```{r}
wts <- get_weights(model)
embedding_wts <- wts[[1]]
```

The following just does some bookkeeping to extract the applicable words and 
assign them as row names to the embedding matrix.

```{r}
words <- tokenizer$word_index %>% 
  as_tibble() %>% 
  pivot_longer(everything(), names_to = "word", values_to = "id") %>%
  filter(id <= tokenizer$num_words) %>%
  arrange(id)

row.names(embedding_wts) <- words$word
```

The following is one of the custom functions you imported from the 
[helper_functions.R](https://bit.ly/32HCP1G) file. You can see that the word 
embeddings that most closely align to a given word are not as intuitive as 
those produced by the natural language model. However, these are the 
embeddings that were optimized for the classification procedure at hand.

```{r}
similar_classification_words("horrible", embedding_wts)
```
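
If you are curious how this lookup works, here is a rough sketch based on 
cosine similarity; the actual implementation lives in 
[helper_functions.R](https://bit.ly/32HCP1G) and may differ in its details:

```{r, eval=FALSE}
# hypothetical sketch: rank words by cosine similarity to a target word
similar_words_sketch <- function(word, embeddings, n = 6) {
  target <- embeddings[word, ]
  sims <- apply(embeddings, 1, function(v) {
    sum(v * target) / (sqrt(sum(v^2)) * sqrt(sum(target^2)))
  })
  head(sort(sims, decreasing = TRUE), n)
}

similar_words_sketch("horrible", embedding_wts)
```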
 
Here's a handy sequence of code that uses the [t-SNE](https://bit.ly/2rDk6rs) 
methodology to visualize nearest-neighbor word embeddings.

```{r}
# plotting too many words makes the output hard to read
n_words_to_plot <- 1000

tsne <- Rtsne::Rtsne(
  X = embedding_wts[1:n_words_to_plot,], 
  perplexity = 100, 
  pca = FALSE
  )

p <- tsne$Y %>%
  as.data.frame() %>%
  mutate(word = row.names(embedding_wts)[1:n_words_to_plot]) %>%
  ggplot(aes(x = V1, y = V2, label = word)) + 
  geom_text(size = 3)

plotly::ggplotly(p)
```
 
 